VIT – Venice Italian Treebank: Syntactic and Quantitative Features
Authors
Abstract
In this paper we describe VIT (Venice Italian Treebank), created at the University of Venice. We focus on its syntactic-semantic features and on a quantitative analysis of the treebank data, comparing them with data from other treebanks. In general, we try to substantiate the claim that treebanking, i.e., deriving grammars or parsers from a treebank, depends heavily on the chosen treebank, and that this dependence seems to stem from substantial factors such as the linguistic framework adopted for structural description or, ultimately, the language being described.
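As a rough illustration of the kind of quantitative comparison mentioned above, the following is a minimal sketch that reads Penn-style bracketed trees and computes a few corpus-level figures: mean sentence length, phrasal nodes per token, and the relative frequency of each phrasal label. The bracket format, the toy Italian trees, the label set, and the function names (`tokenize`, `parse`, `treebank_stats`) are all hypothetical; this is not VIT's actual annotation scheme or the measures used in the paper.

```python
from collections import Counter, deque

def tokenize(s):
    # Split a bracketed string into "(", ")" and atom tokens.
    return s.replace("(", " ( ").replace(")", " ) ").split()

def parse(tokens):
    """Parse one bracketed tree into nested (label, children) tuples.
    Leaves (words) are plain strings."""
    tok = tokens.popleft()
    if tok != "(":
        return tok  # a terminal word
    label = tokens.popleft()
    children = []
    while tokens[0] != ")":
        children.append(parse(tokens))
    tokens.popleft()  # consume the closing ")"
    return (label, children)

def walk(tree, labels, words):
    # Collect words and count phrasal labels. A node that dominates only
    # words is treated as a preterminal (POS tag) and is not counted.
    label, children = tree
    for child in children:
        if isinstance(child, str):
            words.append(child)
        else:
            walk(child, labels, words)
    if not all(isinstance(c, str) for c in children):
        labels[label] += 1

def treebank_stats(bracketed_trees):
    labels = Counter()
    lengths = []
    for s in bracketed_trees:
        words = []
        walk(parse(deque(tokenize(s))), labels, words)
        lengths.append(len(words))
    total = sum(labels.values())
    return {
        "sentences": len(lengths),
        "mean_length": sum(lengths) / len(lengths),
        "phrasal_nodes_per_token": total / sum(lengths),
        "label_freq": {k: v / total for k, v in labels.most_common()},
    }

# Hypothetical toy trees, not taken from VIT.
TREES = [
    "(S (NP (ART il) (N gatto)) (VP (V dorme)))",
    "(S (NP (N Maria)) (VP (V legge) (NP (ART un) (N libro))))",
]

if __name__ == "__main__":
    print(treebank_stats(TREES))
```

Running the same function over two treebanks and comparing the resulting figures is one simple way to see how an annotation scheme shapes the data: a flatter scheme, for instance, lowers `phrasal_nodes_per_token` for the same sentences.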
Similar resources
Enriching the Venice Italian Treebank with Dependency and Grammatical Relations
In this paper we propose a rule-based approach to extract dependency and grammatical relations from the Venice Italian Treebank (VIT) (Delmonte et al., 2007), which is annotated with bracketed tree structures. To our knowledge, the only available dependency-annotated corpus for Italian is the Turin University Treebank (Lesmo et al., 2002), which has 25,000 tokens and is about one tenth the size of VIT. As manual corpus ...
Feature Engineering in Persian Dependency Parser
A dependency parser is one of the most important fundamental tools in natural language processing: it extracts the structure of sentences and determines the relations between words based on dependency grammar. Dependency parsing is well suited to free-word-order languages such as Persian. In this paper, a data-driven dependency parser has been developed with the help of a phrase-structure parser fo...
Italian Treebank lexico semantic annotation and reference lexical resource
The paper reports on the lexico-semantic annotation level of the Italian Treebank, the first Italian corpus with multi-level annotation (morpho-syntactic, syntactic, and lexico-semantic). The annotation strategy and the reference lexical resource are described, along with the results achieved.
A Novel Approach to Conditional Random Field-based Named Entity Recognition using Persian Specific Features
Named Entity Recognition is an information extraction technique that identifies named entities in a text. Three approaches have conventionally been used to extract named entities from text: rule-based, machine-learning-based, and hybrids of the two. Machine-learning-based methods perform well on Persian if they are trained with good features. To get good performanc...
Grammatical relation’s system in treebank annotation
The paper presents theoretical aspects and practical issues related to the development of a system of grammatical relations for corpus annotation. The grammatical relations are arranged in a default inheritance hierarchy based on syntactic and semantic features. Preliminary tests on the annotation of an Italian treebank (the Turin University Treebank) show that the system implements a reasonable ...